Abstract

Strategies in repeated games can be classified as to whether or not they use memory and/or randomization. We consider Markov decision processes and 2-player graph games, both of the deterministic and probabilistic varieties. We characterize when memory and/or randomization are required for winning with respect to various classes of $\omega$-regular objectives, noting particularly when the use of memory can be traded for the use of randomization. In particular, we show that Markov decision processes allow randomized memoryless optimal strategies for all Muller objectives. Furthermore, we show that 2-player probabilistic graph games allow randomized memoryless strategies for winning with probability 1 those Muller objectives which are upward-closed. Upward-closure means that if a set $\alpha$ of infinitely repeating vertices is winning, then all supersets of $\alpha$ are also winning.


1  Introduction 

A two-player graph game is played on a directed graph whose vertices are partitioned into player-1 vertices and player-2 vertices. The two players move a token along the edges of the graph. At player-1 vertices, the first player chooses an outgoing edge, and at player-2 vertices the second player moves the token to a neighboring vertex. The outcome of the game is an infinite path through the graph. An objective for a player can be specified as an $\omega$-regular condition on the outcome of the game [27, 23]. These $\omega$-regular games are used in the modeling [1, 14, 11], verification [34, 12, 2, 20], and control (synthesis) [6, 3, 31, 29] of state-based systems, where the vertices represent states and the players represent components or agents of a system.

A strategy for a player is a recipe that describes how the player chooses a move whenever it is her turn. Strategies can be classified as follows. A pure strategy always chooses a particular edge to extend the game. In contrast, a randomized strategy may choose at a vertex a probability distribution over the outgoing edges. In other words, a randomized strategy instructs the player to toss a coin in order to decide on her move. Randomized strategies are not helpful to win the game with certainty, but they may be useful to win the game with probability 1. To formalize this, notice that every vertex $v$ and every pair $(\sigma, \pi)$ consisting of a player-1 strategy $\sigma$ and a player-2 strategy $\pi$ determines (1) a set $\mathrm{Outcome}(v, \sigma, \pi)$ of possible outcomes if the two players follow the strategies $\sigma$ and $\pi$ starting the game from the initial vertex $v$, and (2) a probability distribution over $\mathrm{Outcome}(v, \sigma, \pi)$ which indicates the likelihood of each possible outcome. We say that at vertex $v$ player 1 surely wins the game with objective $\Phi$ if there is a player-1 strategy $\sigma$ such that for all player-2 strategies $\pi$ we have $\mathrm{Outcome}(v, \sigma, \pi) \subseteq \Phi$, that is, every possible outcome satisfies $\Phi$. A weaker condition is that player 1 almost-surely wins at $v$ with objective $\Phi$, meaning that there is a player-1 strategy $\sigma$ such that for all player-2 strategies $\pi$ the set $\mathrm{Outcome}(v, \sigma, \pi) \setminus \Phi$ of undesirable possible outcomes has probability 0.

Strategies can be classified also according to their memory requirements. A memoryless strategy depends only on the current position of the token. In contrast, a memory strategy may depend on the path the token has taken to reach its current position. It is well known that there are $\omega$-regular objectives which can be surely won using a memory strategy, but cannot be surely won using a memoryless strategy. Here is a simple example: there are three vertices, $v_0$, $v_1$, and $v_2$; at $v_0$ player 1 moves the token to either $v_1$ or $v_2$, and at both of these vertices, player 2 has no choice but to move the token back to $v_0$. The objective to visit both $v_1$ and $v_2$ infinitely often cannot be won by player 1 without using memory; for instance, a winning player-1 strategy may alternate the two moves $v_0 \to v_1$ and $v_0 \to v_2$. Note, however, that in this game player 1 does have a randomized memoryless strategy to almost surely win, such as the strategy that always chooses the successor of $v_0$ uniformly at random. In other words, player 1 can trade memory against a random coin. We systematically study this trade-off for all $\omega$-regular objectives.
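The three-vertex example can be simulated directly. The following is a minimal sketch, not part of the original development: the function names, the string vertex labels, and the fixed random seed are all assumptions made for illustration. A finite simulation can only suggest the behavior "infinitely often"; it does not prove it.

```python
import random

def play(choose, steps):
    """Play the three-vertex game for `steps` rounds; count visits to v1 and v2.

    `choose(history)` is player 1's strategy at v0. It receives the list of
    vertices visited so far and returns "v1" or "v2". Player 2 has no choice
    and always moves the token back to v0.
    """
    history = ["v0"]
    visits = {"v1": 0, "v2": 0}
    for _ in range(steps):
        nxt = choose(history)
        visits[nxt] += 1
        history += [nxt, "v0"]
    return visits

def pure_memoryless(history):
    # Depends only on the current vertex (always v0) and never randomizes.
    return "v1"

def pure_with_memory(history):
    # One bit of memory: pick the successor that was not chosen last time.
    return "v2" if len(history) >= 2 and history[-2] == "v1" else "v1"

rng = random.Random(0)  # fixed seed (an assumption) so that runs are repeatable

def randomized_memoryless(history):
    # Ignores the history entirely and tosses a fair coin.
    return rng.choice(["v1", "v2"])

result_pure = play(pure_memoryless, 100)
result_memory = play(pure_with_memory, 100)
result_random = play(randomized_memoryless, 1000)
```

In this sketch the pure memoryless strategy never visits $v_2$, the alternating strategy visits both vertices equally often, and the coin-tossing strategy visits both without consulting the history.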

The results are categorized according to the form of the game graph and the form of the winning condition. For winning conditions, we use the classical classes of parity, Rabin, Streett, and Muller objectives [33]. For game graphs, we distinguish graphs without probabilistic vertices, which are the graphs described above, and graphs that may contain also probabilistic vertices. At a probabilistic vertex, the token is moved according to a fixed probability distribution over the outgoing edges, that is, neither of the two players can choose the successor vertex. Accordingly, we classify the game graphs into 1-player graphs (only player-1 vertices), 1½-player graphs (player-1 and probabilistic vertices), 2-player graphs (player-1 and player-2 vertices), and 2½-player graphs (player-1, player-2, and probabilistic vertices). Notice that 1-player graphs are transition systems, and 1½-player graphs are Markov decision processes (MDPs). Instead of almost-sure winning, we consider the more general condition of optimality. For a vertex $v$, a player-1 strategy $\sigma$, and a player-2 strategy $\pi$, let $\Pr_v^{\sigma,\pi}(\Phi)$ be the probability of the set $\mathrm{Outcome}(v, \sigma, \pi) \cap \Phi$ of desirable possible outcomes. A player-1 strategy $\sigma$ is optimal at $v$ for $\Phi$ if $\inf_{\pi} \Pr_v^{\sigma,\pi}(\Phi) \ge \inf_{\pi} \Pr_v^{\sigma',\pi}(\Phi)$ for all player-1 strategies $\sigma'$. It can be shown that player 1 almost surely wins at $v$ for $\Phi$ iff she has a strategy $\sigma$ with $\inf_{\pi} \Pr_v^{\sigma,\pi}(\Phi) = 1$.

For Rabin objectives, it is known that pure memoryless strategies suffice for the sure winning of 2-player games [16, 15], and for the more special case of parity objectives, it is known that there always exist optimal strategies in 2½-player games which are both pure and memoryless [26, 5]. At the other extreme, Streett games are known to require memory for sure winning even in the 1-player case (cf. the above example), and it is easy to see that they also require memory for almost-sure winning in the 2-player case (cf. Example 3). However, for 1½-player Streett games, and more generally, for all 1½-player Muller games, we show that no memory is required for optimal strategies if randomization is available (Theorem 9). In other words, in MDPs the optimal value can be obtained without memory for every objective which cannot distinguish between two paths that visit the same vertices infinitely often. Furthermore, we show that if the objective is Rabin, then optimality in MDPs can be achieved by strategies that are both pure and memoryless (Theorem 8).

We then take a closer look at the general case of 2½-player $\omega$-regular games. We define a Muller objective $\Phi$ to be upward-closed if for every infinite path $r \in \Phi$, if every vertex that occurs infinitely often in $r$ also occurs infinitely often in $r'$, then $r' \in \Phi$. For example, every generalized Büchi objective is upward-closed. We prove that memoryless strategies suffice for the almost-sure winning of upward-closed 2½-player games (Theorem 11). If randomization is not used, then upward-closed objectives (such as the generalized Büchi objective in the above example) may require memory for almost-sure winning; thus, the upward-closed games allow us to trade memory for randomization. Indeed, we give an example of 2-player Muller games with $n$ vertices where sure winning requires $O(n)$ memory but almost-sure winning can be achieved without memory. Moreover, there is a game such that, if a Muller objective is not upward-closed, then randomized memoryless strategies are no better than pure memoryless strategies for almost-sure winning, and they are not as powerful as strategies with memory. This shows that the upward-closed Muller games are the most general games with $\omega$-regular objectives where memory can be traded for randomization.

2  Preliminaries 

Game graphs. A turn-based probabilistic game graph (2½-player game graph) $G = ((V, E), V_0, V_1, V_2, p)$ consists of a directed graph $(V, E)$, a partition $V_0, V_1, V_2$ of the vertex set $V$, and a probabilistic transition function $p\colon V_0 \to \mathcal{D}(V)$, where $\mathcal{D}(V)$ denotes the set of probability distributions over the vertex set $V$. The vertices in $V_1$ are the player-1 vertices, where player 1 decides the successor vertex; the vertices in $V_2$ are the player-2 vertices, where player 2 decides the successor vertex; and the vertices in $V_0$ are the probabilistic vertices, where the successor vertex is chosen according to the probabilistic transition function $p$. We assume that, for $u \in V_0$ and $v \in V$, we have $(u, v) \in E$ iff $p(u)(v) > 0$, and we often write $p(u, v)$ for $p(u)(v)$. For technical convenience we assume that in $(V, E)$ every vertex has at least one outgoing edge, and we write $v \in E(u)$ for $(u, v) \in E$.
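The definition above translates into a small data structure. The sketch below is one possible encoding and is not taken from the paper: the class name `GameGraph`, its field names, and the use of the integers 0, 1, 2 for the three vertex classes are assumptions. The `validate` method checks the two stated conventions, namely that every vertex has an outgoing edge and that the edges leaving a probabilistic vertex are exactly the support of its distribution.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class GameGraph:
    edges: dict  # vertex -> set of successors (the edge relation E)
    owner: dict  # vertex -> 0 (probabilistic), 1 (player 1), or 2 (player 2)
    prob: dict   # probabilistic vertex u -> {v: p(u, v)}

    def validate(self):
        for u, succs in self.edges.items():
            if not succs:
                raise ValueError(f"vertex {u} has no outgoing edge")
            if not succs <= set(self.edges):
                raise ValueError(f"vertex {u} has an unknown successor")
            if self.owner[u] == 0:
                dist = self.prob[u]
                # (u, v) is an edge iff p(u)(v) > 0.
                support = {v for v, pr in dist.items() if pr > 0}
                if support != succs:
                    raise ValueError(f"edges of {u} differ from the support of p({u})")
                if sum(dist.values()) != 1:
                    raise ValueError(f"p({u}) is not a probability distribution")
        return True

# A hypothetical player-1 MDP (no player-2 vertices), used only as an example.
half = Fraction(1, 2)
g = GameGraph(
    edges={"v0": {"v1"}, "v1": {"v0", "v2"}, "v2": {"v2"}},
    owner={"v0": 1, "v1": 0, "v2": 1},
    prob={"v1": {"v0": half, "v2": half}},
)
```

Exact fractions are used so that the check that a distribution sums to 1 is not affected by floating-point rounding.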

An infinite path, or play, of the game graph $G$ is an infinite sequence $(v_0, v_1, v_2, \ldots)$ of vertices such that $(v_k, v_{k+1}) \in E$ for all $k \in \mathbb{N}$. We write $\Omega$ for the set of all plays, and for every vertex $v \in V$ we write $\Omega_v$ for the set of plays that start from the vertex $v$. A set $U \subseteq V$ of vertices is called $p$-closed if for every $u \in U \cap V_0$, we have that $(u, v) \in E$ implies $v \in U$. A $p$-closed subset $U$ of $V$ induces a subgame graph of $G$, indicated by $G \upharpoonright U$, if for every vertex $u \in U \cap (V_1 \cup V_2)$ there is a vertex $v \in U$ such that $(u, v) \in E$.
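Both conditions are simple set checks. The following sketch assumes a dictionary encoding that is not part of the paper: `edges` maps each vertex to its set of successors, and `owner` maps each vertex to 0 (probabilistic), 1, or 2. The small graph at the end is hypothetical.

```python
def is_p_closed(U, edges, owner):
    """U is p-closed if every probabilistic vertex in U has all its successors in U."""
    return all(edges[u] <= U for u in U if owner[u] == 0)

def induces_subgame(U, edges, owner):
    """A p-closed U induces a subgame graph if each player vertex keeps a successor in U."""
    return is_p_closed(U, edges, owner) and all(
        edges[u] & U for u in U if owner[u] in (1, 2)
    )

# Hypothetical example graph: a is a player-1 vertex, c a player-2 vertex,
# b and d are probabilistic vertices.
edges = {"a": {"b", "c"}, "b": {"a"}, "c": {"c"}, "d": {"a", "c"}}
owner = {"a": 1, "b": 0, "c": 2, "d": 0}

closed_ab = is_p_closed({"a", "b"}, edges, owner)         # b only leads to a
closed_abd = is_p_closed({"a", "b", "d"}, edges, owner)   # d can leave via c
subgame_ab = induces_subgame({"a", "b"}, edges, owner)
subgame_a = induces_subgame({"a"}, edges, owner)          # a has no successor inside
```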

The turn-based deterministic game graphs (2-player game graphs) are the special case of the 2½-player game graphs with $V_0 = \emptyset$. The Markov decision processes (1½-player game graphs) are the special case of the 2½-player game graphs with $V_2 = \emptyset$ or $V_1 = \emptyset$. We refer to the MDPs with $V_2 = \emptyset$ as player-1 MDPs, and to the MDPs with $V_1 = \emptyset$ as player-2 MDPs. A game graph which is both deterministic and an MDP is called a transition system (1-player game graph): a player-1 transition system has only player-1 vertices; a player-2 transition system has only player-2 vertices.

Strategies. A strategy for player 1 is a function $\sigma\colon V^* \cdot V_1 \to \mathcal{D}(V)$ that assigns a probability distribution to every finite sequence $w \in V^* \cdot V_1$ of vertices, which represents the history of the play so far. Player 1 follows the strategy $\sigma$ if in each move, given that the current history of the play is $w$, she chooses the next vertex according to the probability distribution $\sigma(w)$. A strategy must prescribe only available moves, i.e., for all $w \in V^*$, $v \in V_1$, and $u \in V$, if $\sigma(w \cdot v)(u) > 0$, then $(v, u) \in E$. The strategies for player 2 are defined analogously. We denote by $\Sigma$ and $\Pi$ the set of all strategies for player 1 and player 2, respectively. Note that for player-1 MDPs the set $\Pi$ is a singleton, i.e., player 2 has only a single trivial strategy.

Once a starting vertex $v \in V$ and strategies $\sigma \in \Sigma$ and $\pi \in \Pi$ for the two players are fixed, the outcome of the game is a random path for which the probabilities of events are uniquely defined, where an event $\mathcal{A} \subseteq \Omega$ is a measurable set of paths. Given strategies $\sigma$ for player 1 and $\pi$ for player 2, a play $(v_0, v_1, v_2, \ldots)$ is feasible if for every $k \in \mathbb{N}$ the following three conditions hold: (1) if $v_k \in V_0$, then $(v_k, v_{k+1}) \in E$; (2) if $v_k \in V_1$, then $\sigma(v_0, v_1, \ldots, v_k)(v_{k+1}) > 0$; and (3) if $v_k \in V_2$, then $\pi(v_0, v_1, \ldots, v_k)(v_{k+1}) > 0$. Given strategies $\sigma \in \Sigma$ and $\pi \in \Pi$, and a vertex $v$, we denote by $\mathrm{Outcome}(v, \sigma, \pi) \subseteq \Omega_v$ the set of feasible plays that start from $v$ given strategies $\sigma$ and $\pi$. For a vertex $v \in V$ and an event $\mathcal{A} \subseteq \Omega$, we write $\Pr_v^{\sigma,\pi}(\mathcal{A})$ for the probability that a path belongs to $\mathcal{A}$ if the game starts from the vertex $v$ and the players follow the strategies $\sigma$ and $\pi$, respectively. In the context of player-1 MDPs we often omit the argument $\pi$, because $\Pi$ is a singleton set.
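The three feasibility conditions can be checked step by step on any finite prefix of a play. The sketch below is illustrative only: the function name, the encoding of a strategy as a Python function from a history tuple to a dictionary of successor probabilities, and the example game are assumptions, not definitions from the paper.

```python
def is_feasible_prefix(path, edges, owner, sigma, pi):
    """Check conditions (1)-(3) on every step of a finite play prefix.

    `sigma` and `pi` map a history (a tuple of vertices ending in the current
    vertex) to a dict {successor: probability}.
    """
    for k in range(len(path) - 1):
        history, v, nxt = tuple(path[: k + 1]), path[k], path[k + 1]
        if owner[v] == 0:
            ok = nxt in edges[v]                  # (1) probabilistic vertex
        elif owner[v] == 1:
            ok = sigma(history).get(nxt, 0) > 0   # (2) player-1 vertex
        else:
            ok = pi(history).get(nxt, 0) > 0      # (3) player-2 vertex
        if not ok:
            return False
    return True

# Hypothetical game: v0 belongs to player 1, v1 is probabilistic, v2 belongs to player 2.
edges = {"v0": {"v1"}, "v1": {"v0", "v2"}, "v2": {"v2", "v0"}}
owner = {"v0": 1, "v1": 0, "v2": 2}
sigma = lambda history: {"v1": 1.0}  # player 1 always moves to v1
pi = lambda history: {"v2": 1.0}     # player 2 always stays in v2

feasible = is_feasible_prefix(["v0", "v1", "v0", "v1", "v2", "v2"], edges, owner, sigma, pi)
infeasible = is_feasible_prefix(["v0", "v1", "v2", "v0"], edges, owner, sigma, pi)
```

The second prefix follows edges of the graph but is not feasible, because $\pi$ assigns probability 0 to the move from `v2` back to `v0`.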

Objectives. Objectives for the players in nonterminating games are specified by providing the set of winning plays $\Phi \subseteq \Omega$ for each player. In this paper we study only zero-sum games [30, 18], where the objectives of the two players are strictly competitive. In other words, it is implicit that if the objective of one player is $\Phi$, then the objective of the other player is $\Omega \setminus \Phi$. Given a game graph $G$ and an objective $\Phi \subseteq \Omega$, we write $(G, \Phi)$ for the game played on the graph $G$ with the objective $\Phi$ for player 1.

A general class of objectives are the Borel objectives [21]. A Borel objective $\Phi \subseteq V^\omega$ is a Borel set in the Cantor topology on $V^\omega$. In this paper we consider $\omega$-regular objectives [33], which lie in the first 2½ levels of the Borel hierarchy (i.e., in the intersection of $\Sigma_3$ and $\Pi_3$). The $\omega$-regular objectives, and subclasses thereof, can be specified in the following forms.

For a play $\omega = (v_0, v_1, v_2, \ldots) \in \Omega$, we define $\mathrm{Inf}(\omega) = \{v \in V \mid v_k = v \text{ for infinitely many } k \ge 0\}$ to be the set of states that occur infinitely often in $\omega$. We use colors to define objectives independent of game graphs. For a set $C$ of colors, we write $[\![\cdot]\!]\colon C \to 2^V$ for a function that maps each color to a set of vertices. Inversely, given a set $U \subseteq V$ of states, we write $[\![U]\!] = \{c \in C \mid [\![c]\!] \cap U \ne \emptyset\}$ for the set of colors that occur in $U$.

1. Reachability and safety objectives. Given a color $c$, the reachability objective requires that some vertex of color $c$ be visited. Let $T = [\![c]\!]$ be the set of so-called target vertices. Formally, we write $\mathrm{Reach}(T) = \{(v_0, v_1, v_2, \ldots) \in \Omega \mid v_k \in T \text{ for some } k \ge 0\}$ for the set of winning plays. Given $c$, the safety objective requires that only vertices of color $c$ be visited. Let $F = [\![c]\!]$ be the set of so-called safe vertices. Formally, the set of winning plays is $\mathrm{Safe}(F) = \{(v_0, v_1, v_2, \ldots) \in \Omega \mid v_k \in F \text{ for all } k \ge 0\}$.

2. Büchi and generalized Büchi objectives. Given a color $c$, the Büchi objective requires that some vertex of color $c$ be visited infinitely often. Let $B = [\![c]\!]$ be the set of so-called Büchi vertices. Formally, the set of winning plays is $\mathrm{B\ddot{u}chi}(B) = \{\omega \in \Omega \mid \mathrm{Inf}(\omega) \cap B \ne \emptyset\}$. Given a set $C = \{c_1, \ldots, c_m\}$ of colors, the generalized Büchi objective requires that all $m$ Büchi objectives in $C$ be satisfied. Formally, the set of winning plays is $\bigcap_{1 \le i \le m} \mathrm{B\ddot{u}chi}([\![c_i]\!])$.

3. Rabin, parity, and Streett objectives. Given a set $P = \{(e_1, f_1), \ldots, (e_m, f_m)\}$ of pairs of colors, the Rabin objective requires that for some $1 \le i \le m$, all vertices of color $e_i$ be visited finitely often and some vertex of color $f_i$ be visited infinitely often. Let $R = \{(E_1, F_1), \ldots, (E_m, F_m)\}$ be the corresponding set of so-called Rabin pairs, where $E_i = [\![e_i]\!]$ and $F_i = [\![f_i]\!]$ for all $1 \le i \le m$. Formally, the set of winning plays is $\mathrm{Rabin}(R) = \{\omega \in \Omega \mid \exists\, 1 \le i \le m.\ (\mathrm{Inf}(\omega) \cap E_i = \emptyset \wedge \mathrm{Inf}(\omega) \cap F_i \ne \emptyset)\}$. The parity (or Rabin-chain) objectives are the special case of Rabin objectives where $E_1 \subset F_1 \subset \cdots \subset E_m \subset F_m$. Given $P$, the Streett objective requires that for each $1 \le i \le m$, if some vertex of color $f_i$ is visited infinitely often, then some vertex of color $e_i$ is visited infinitely often. Formally, for the set $S = \{(E_1, F_1), \ldots, (E_m, F_m)\}$ of so-called Streett pairs, the set of winning plays is $\mathrm{Streett}(S) = \{\omega \in \Omega \mid \forall\, 1 \le i \le m.\ (\mathrm{Inf}(\omega) \cap E_i \ne \emptyset \vee \mathrm{Inf}(\omega) \cap F_i = \emptyset)\}$. Note that the Rabin and Streett objectives are dual.

4. Muller and upward-closed objectives. Given a set $C$ of colors, and a set $\mathcal{F} \subseteq 2^C$ of subsets of the colors, the Muller objective requires that the set of colors that appear infinitely often in a play is exactly one of the sets in $\mathcal{F}$. Formally, for the set $\mathcal{M}_{\mathcal{F}} = \{U \subseteq V \mid [\![U]\!] \in \mathcal{F}\}$ of so-called Muller sets of vertices, the set of winning plays is $\mathrm{Muller}(\mathcal{M}_{\mathcal{F}}) = \{\omega \in \Omega \mid \mathrm{Inf}(\omega) \in \mathcal{M}_{\mathcal{F}}\}$. We call $\mathcal{F}$ a (game graph independent) specification of the objective $\mathrm{Muller}(\mathcal{M}_{\mathcal{F}})$, because $\mathcal{F}$ does not refer to the vertex names of $G$. The specification $\mathcal{F}$ is upward-closed if for all $\alpha \subseteq \beta \subseteq C$, if $\alpha \in \mathcal{F}$, then $\beta \in \mathcal{F}$.
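Since the objectives in items 2 to 4 depend on a play only through the set of vertices visited infinitely often, each of them can be written as a predicate on that set. The sketch below is an illustration under assumed names; none of the function names come from the paper. The upward-closure check uses the fact that closure under adding one color at a time implies closure under all supersets.

```python
from itertools import combinations

def buchi(inf, B):
    return bool(inf & B)

def generalized_buchi(inf, Bs):
    return all(inf & B for B in Bs)

def rabin(inf, pairs):
    # Some pair (E, F): E is visited finitely often, F infinitely often.
    return any(not (inf & E) and bool(inf & F) for E, F in pairs)

def streett(inf, pairs):
    # Every pair (E, F): if F is visited infinitely often, then so is E.
    return all(bool(inf & E) or not (inf & F) for E, F in pairs)

def muller(inf, muller_sets):
    return frozenset(inf) in muller_sets

def is_upward_closed(spec, colors):
    """spec is a set of frozensets of colors; check closure under supersets within `colors`."""
    return all(frozenset(a | {c}) in spec for a in spec for c in colors)

# The three-vertex game from the introduction: visit both v1 and v2 infinitely often.
objective = [{"v1"}, {"v2"}]
alternating_wins = generalized_buchi({"v0", "v1", "v2"}, objective)
stuck_on_v1_wins = generalized_buchi({"v0", "v1"}, objective)

# Duality: on every set of vertices, Streett holds exactly when Rabin
# (for the same pairs) fails.
pairs = [({"v1"}, {"v2"})]
vertices = ["v0", "v1", "v2"]
subsets = [set(s) for n in range(1, 4) for s in combinations(vertices, n)]
dual = all(streett(s, pairs) == (not rabin(s, pairs)) for s in subsets)

closed_spec = is_upward_closed({frozenset({1, 2})}, {1, 2})
open_spec = is_upward_closed({frozenset({1})}, {1, 2})
```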

The generalized Büchi objectives, Rabin objectives, and Streett objectives are special cases of Muller objectives. In particular, all Büchi and generalized Büchi objectives are upward-closed. Moreover, reachability and safety objectives can be turned into Büchi objectives on slightly modified game graphs. However, a parity, Rabin, or Streett objective need not be upward-closed.

We commonly use terminology like the following: a 2½-player Muller game $(G, \mathrm{Muller}(\mathcal{M}_{\mathcal{F}}))$ consists of a 2½-player game graph $G$ and a Muller objective for player 1, where $\mathcal{M}_{\mathcal{F}}$ is a set of Muller sets.

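Values of a game and optimal strategies. Given objectives $\Phi$ for player 1 and $\Omega \setminus \Phi$ for player 2, we define the value functions $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}$ and $\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}$ for the players 1 and 2, respectively, as follows:

\[
\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(v) = \sup_{\sigma \in \Sigma} \inf_{\pi \in \Pi} \Pr\nolimits_v^{\sigma,\pi}(\Phi)
\]

\[
\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\Omega \setminus \Phi)(v) = \sup_{\pi \in \Pi} \inf_{\sigma \in \Sigma} \Pr\nolimits_v^{\sigma,\pi}(\Omega \setminus \Phi)
\]

A strategy $\sigma$ for player 1 is optimal from vertex $v$ for objective $\Phi$ if $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(v) = \inf_{\pi \in \Pi} \Pr_v^{\sigma,\pi}(\Phi)$. The optimal strategies for player 2 are defined analogously.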

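Theorem 1 (Quantitative determinacy [24]). For all 2½-player game graphs, all Borel objectives $\Phi$, and all vertices $v$, we have $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(v) + \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\Omega \setminus \Phi)(v) = 1$.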

Every $\omega$-regular objective can be expressed as a parity objective [28, 33]. The existence of optimal strategies for 2½-player games with parity objectives follows from [26, 5]. This gives the following theorem.

Theorem 2 (Optimal strategies). For all 2½-player game graphs with Muller objectives, optimal strategies exist for both players.

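Sure and almost-sure winning strategies. Given an objective $\Phi$, a strategy $\sigma$ is a sure winning strategy for player 1 from a vertex $v$ if for every strategy $\pi$ of player 2 we have $\mathrm{Outcome}(v, \sigma, \pi) \subseteq \Phi$. Similarly, a strategy $\sigma$ is an almost-sure winning strategy for player 1 from a vertex $v$ for the objective $\Phi$ if for every strategy $\pi$ of player 2 we have $\Pr_v^{\sigma,\pi}(\Phi) = 1$. The sure and almost-sure winning strategies for player 2 are defined analogously. Given an objective $\Phi$, the sure winning set $\langle\!\langle 1 \rangle\!\rangle_{\mathit{sure}}(\Phi)$ for player 1 is the set of vertices from which player 1 has a sure winning strategy. The almost-sure winning set $\langle\!\langle 1 \rangle\!\rangle_{\mathit{almost}}(\Phi)$ for player 1 is the set of vertices from which player 1 has an almost-sure winning strategy. The sure winning set $\langle\!\langle 2 \rangle\!\rangle_{\mathit{sure}}(\Omega \setminus \Phi)$ and the almost-sure winning set $\langle\!\langle 2 \rangle\!\rangle_{\mathit{almost}}(\Omega \setminus \Phi)$ for player 2 are defined analogously. It follows from the definitions that for all 2½-player game graphs and all objectives $\Phi$, we have $\langle\!\langle 1 \rangle\!\rangle_{\mathit{sure}}(\Phi) \subseteq \langle\!\langle 1 \rangle\!\rangle_{\mathit{almost}}(\Phi)$ and $\langle\!\langle 2 \rangle\!\rangle_{\mathit{sure}}(\Omega \setminus \Phi) \subseteq \langle\!\langle 2 \rangle\!\rangle_{\mathit{almost}}(\Omega \setminus \Phi)$.

Figure 1. An MDP with a reachability objective.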

Computing sure winning and almost-sure winning sets and strategies is referred to as the qualitative analysis of 2½-player games. It follows from Theorem 2 that $\langle\!\langle 1 \rangle\!\rangle_{\mathit{almost}}(\Phi) = \{v \in V \mid \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(v) = 1\}$. The following result is the classical determinacy result for 2-player deterministic games.

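Theorem 3 (Qualitative determinacy [25]). For all 2-player game graphs and all Borel objectives $\Phi$, we have $\langle\!\langle 1 \rangle\!\rangle_{\mathit{sure}}(\Phi) \cup \langle\!\langle 2 \rangle\!\rangle_{\mathit{sure}}(\Omega \setminus \Phi) = V$.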

The following example shows that Theorem 3 cannot be extended to 1½-player and 2½-player games.

Example 1. Consider the MDP with a reachability objective shown in Fig. 1. In all our figures we use □ to denote player-1 vertices, ◇ to denote player-2 vertices, and ○ to denote probabilistic vertices. The objective $\Phi$ of player 1 is to reach the vertex $v_2$. Given the strategy $\sigma$ that chooses $v_0 \to v_1$ at vertex $v_0$, the target $v_2$ is reached with probability 1. However, there is an infinite path that is consistent with the player-1 strategy $\sigma$ but only visits the vertices $v_0$ and $v_1$. Hence, $\langle\!\langle 1 \rangle\!\rangle_{\mathit{sure}}(\Phi) = \{v_2\}$ and $\langle\!\langle 1 \rangle\!\rangle_{\mathit{almost}}(\Phi) = \{v_0, v_1, v_2\}$. This shows that in general for MDPs and 2½-player games $\langle\!\langle 1 \rangle\!\rangle_{\mathit{sure}}(\Phi) \subsetneq \langle\!\langle 1 \rangle\!\rangle_{\mathit{almost}}(\Phi)$. ■
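The gap between sure and almost-sure winning in Example 1 can be made concrete numerically. Figure 1 is not reproduced in this text, so the sketch below assumes a particular shape for it: $v_0$ is a player-1 vertex with the single move to $v_1$, $v_1$ is a probabilistic vertex that moves to $v_0$ or $v_2$ with probability 1/2 each, and $v_2$ is absorbing. The probabilities in particular are an assumption; any positive values give the same qualitative picture.

```python
from fractions import Fraction

# ASSUMED transition probabilities at the probabilistic vertex v1.
p_v1 = {"v0": Fraction(1, 2), "v2": Fraction(1, 2)}

def reach_probability(steps):
    """Probability that v2 has been reached within `steps` moves, starting at v0."""
    dist = {"v0": Fraction(1), "v1": Fraction(0), "v2": Fraction(0)}
    for _ in range(steps):
        nxt = {v: Fraction(0) for v in dist}
        nxt["v1"] += dist["v0"]        # sigma chooses v0 -> v1
        for u, pr in p_v1.items():     # v1 moves at random
            nxt[u] += dist["v1"] * pr
        nxt["v2"] += dist["v2"]        # v2 is absorbing
        dist = nxt
    return dist["v2"]

after_2 = reach_probability(2)
after_4 = reach_probability(4)
after_40 = reach_probability(40)
```

Under these assumptions the probability of having reached $v_2$ after $2k$ moves is $1 - 2^{-k}$, which tends to 1, while the play that cycles between $v_0$ and $v_1$ forever remains a feasible outcome of probability 0.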

3  Special  Families  of  Strategies 

Pure, finite-memory, and memoryless strategies. We classify strategies according to their use of randomization and memory. The strategies that do not use randomization are called pure. A player-1 strategy $\sigma$ is pure if for all $w \in V^*$ and $v \in V_1$, there is a vertex $u \in V$ such that $\sigma(w \cdot v)(u) = 1$. The pure strategies for player 2 are defined analogously. We denote by $\Sigma^P$ and $\Pi^P$ the sets of pure strategies for player 1 and player 2, respectively. A strategy that is not necessarily pure is called randomized.

A strategy is finite-memory if it depends on the current vertex and on a finite number of bits from the history of the play so far. We denote by $\Sigma^F$ the set of finite-memory strategies for player 1, and by $\Sigma^{PF}$ the set of pure finite-memory strategies; that is, $\Sigma^{PF} = \Sigma^P \cap \Sigma^F$. A memoryless strategy does not depend on the history but only on the current vertex. A memoryless strategy for player 1 can be represented as a function $\sigma\colon V_1 \to \mathcal{D}(V)$ such that for all $v \in V_1$ and $u \in V$, if $\sigma(v)(u) > 0$, then $(v, u) \in E$. A pure memoryless strategy is a pure strategy that is memoryless. A pure memoryless strategy for player 1 can be represented as a function $\sigma\colon V_1 \to V$ such that $(v, \sigma(v)) \in E$ for all $v \in V_1$. We denote by $\Sigma^M$ the set of memoryless strategies for player 1, and by $\Sigma^{PM}$ the set of pure memoryless strategies; that is, $\Sigma^{PM} = \Sigma^P \cap \Sigma^M$. Analogously we define the corresponding strategy families for player 2.

Given a strategy $\sigma \in \Sigma$ for player 1, we write $G_\sigma$ for the game played on the graph $G$ under the constraint that player 1 follows the strategy $\sigma$. The corresponding definition for a player-2 strategy is analogous. Observe that given a 2½-player game graph $G$ and a memoryless player-1 strategy $\sigma$, the result $G_\sigma$ is a player-2 MDP. Similarly, for a player-1 MDP $G$ and a memoryless player-1 strategy $\sigma$, the result $G_\sigma$ is a Markov chain. Hence, if $G$ is a 2½-player game graph and the two players follow given memoryless strategies $\sigma$ and $\pi$, the result $G_{\sigma,\pi}$ is a Markov chain. These observations will be useful in the analysis of 2½-player games.
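Fixing a memoryless strategy amounts to turning the vertices of that player into probabilistic vertices. The sketch below illustrates this under an assumed encoding that is not from the paper: `owner` maps vertices to 0, 1, or 2, and `p` as well as the memoryless strategies map vertices to distributions over successors. The function name and the example game are hypothetical.

```python
from fractions import Fraction

def fix_strategy(owner, p, strategy, player):
    """Return (owner, p) of the graph in which `player` follows the memoryless `strategy`.

    Every vertex of `player` becomes a probabilistic vertex whose distribution
    is the one prescribed by the strategy.
    """
    new_owner = {v: (0 if who == player else who) for v, who in owner.items()}
    return new_owner, {**p, **strategy}

half = Fraction(1, 2)
owner = {"a": 1, "b": 2, "c": 0}       # a 2½-player game graph
p = {"c": {"a": half, "c": half}}
sigma = {"a": {"b": half, "c": half}}  # randomized memoryless player-1 strategy
pi = {"b": {"a": Fraction(1)}}         # pure memoryless player-2 strategy

g_sigma = fix_strategy(owner, p, sigma, 1)        # a player-2 MDP
g_sigma_pi = fix_strategy(*g_sigma, pi, 2)        # a Markov chain
```

After the first call only probabilistic and player-2 vertices remain; after the second call every vertex is probabilistic and every row of the transition function sums to 1.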

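Sufficiency of a family of strategies. Let $C \in \{P, M, F, PM, PF\}$ and consider the family $\Sigma^C$ of special strategies for player 1. We say that the family $\Sigma^C$ suffices with respect to an objective $\Phi$ on a class $\mathcal{G}$ of game graphs for

• sure winning if for every game graph $G \in \mathcal{G}$, for every vertex $v \in \langle\!\langle 1 \rangle\!\rangle_{\mathit{sure}}(\Phi)$ there is a player-1 strategy $\sigma \in \Sigma^C$ such that for every player-2 strategy $\pi \in \Pi$ we have $\mathrm{Outcome}(v, \sigma, \pi) \subseteq \Phi$;

• almost-sure winning if for every game graph $G \in \mathcal{G}$, for every vertex $v \in \langle\!\langle 1 \rangle\!\rangle_{\mathit{almost}}(\Phi)$ there is a player-1 strategy $\sigma \in \Sigma^C$ such that for every player-2 strategy $\pi \in \Pi$ we have $\Pr_v^{\sigma,\pi}(\Phi) = 1$;

• optimality if for every game graph $G \in \mathcal{G}$, for every vertex $v \in V$ there is a player-1 strategy $\sigma \in \Sigma^C$ such that $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(v) = \inf_{\pi \in \Pi} \Pr_v^{\sigma,\pi}(\Phi)$.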

For sure winning, the 1½-player and 2½-player games coincide with 2-player deterministic games where the random player (who chooses the successor at the probabilistic vertices) is interpreted as an adversary, i.e., as player 2. This is formalized by the proposition below.

Proposition 1. If a family $\Sigma^C$ of strategies suffices for sure winning with respect to an objective $\Phi$ on all 2-player game graphs, then the family $\Sigma^C$ suffices for sure winning with respect to $\Phi$ also on all 1½-player and 2½-player game graphs.

The following proposition states that randomization is not necessary for sure winning.

Proposition 2. If a family $\Sigma^C$ of strategies suffices for sure winning with respect to a Borel objective $\Phi$ on all 2½-player game graphs, then the family $\Sigma^C \cap \Sigma^P$ of pure strategies suffices for sure winning with respect to $\Phi$ on all 2½-player game graphs.

The following result is the classical determinacy result for 2-player deterministic graph games.

Theorem 4 (Pure and finite-memory strategies).

1. [25] The family $\Sigma^P$ of pure strategies suffices for sure winning with respect to all Borel objectives on 2-player game graphs.

2. [19] The family $\Sigma^{PF}$ of pure finite-memory strategies suffices for sure winning with respect to all Muller objectives on 2-player game graphs.

It is easy to see that for any 2-player game a sure winning strategy is also an almost-sure winning strategy. Hence the almost-sure winning strategies need not be more complex than the sure winning strategies in 2-player games. This is formalized by the proposition below.

Proposition 3. If a family $\Sigma^C$ of strategies suffices for sure winning with respect to a Borel objective $\Phi$ on all 2-player game graphs, then the family $\Sigma^C$ suffices also for optimality with respect to $\Phi$ on all 2-player game graphs.

4  Reachability  and  Safety  Objectives 

Pure memoryless strategies suffice for sure winning and optimality (and therefore for almost-sure winning) with respect to reachability and safety objectives.

Theorem 5.

1. The family $\Sigma^{PM}$ of pure memoryless strategies suffices for sure winning with respect to reachability and safety objectives on 2½-player game graphs.

2. [7] The family $\Sigma^{PM}$ of pure memoryless strategies suffices for optimality with respect to reachability and safety objectives on 2½-player game graphs.

Theorem 5(1) for 2-player games is classical. It is an easy consequence of the alternating reachability analysis of And-Or graphs; see [33] for details. Due to Proposition 1, the result carries over to 2½-player games. Theorem 5(2) follows from the results of [7]. However, the proof given there is analytical; it analyzes the behavior of discounted games as the discount factor converges to 1. Since in the following sections we will make frequent use of this result for MDPs, we provide here an elementary proof that pure memoryless strategies suffice for optimality with respect to reachability objectives on MDPs. Our proof uses only facts from graph theory and matrix algebra.
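The alternating reachability analysis mentioned above is commonly implemented as an attractor computation. The following sketch is one standard way to write it and is not taken from [33]; the function name and the dictionary encoding are assumptions. In the spirit of Proposition 1, probabilistic vertices (owner 0) are treated like player-2 vertices.

```python
def attractor(edges, owner, target):
    """Return (win, strategy) for the objective Reach(target).

    `win` is the set of vertices from which player 1 surely reaches `target`;
    `strategy` is a pure memoryless choice at each player-1 vertex of `win`
    outside `target`.
    """
    win, strategy = set(target), {}
    changed = True
    while changed:
        changed = False
        for v, succs in edges.items():
            if v in win:
                continue
            if owner[v] == 1:
                good = sorted(succs & win)
                if good:  # player 1 can step into the winning set
                    strategy[v] = good[0]
                    win.add(v)
                    changed = True
            elif succs <= win:  # the opponent (or chance) cannot avoid the winning set
                win.add(v)
                changed = True
    return win, strategy

# Hypothetical 2-player game: a and t are player-1 vertices; b, c, d are player-2 vertices.
edges = {"a": {"b", "c"}, "b": {"t", "a"}, "c": {"t"}, "t": {"t"}, "d": {"d"}}
owner = {"a": 1, "b": 2, "c": 2, "t": 1, "d": 2}
win, strategy = attractor(edges, owner, {"t"})
```

Each player-1 vertex is assigned a successor that entered the winning set earlier, so following `strategy` strictly decreases the distance to the target, which is why no memory is needed.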

Consider a player-1 MDP $G = ((V, E), V_0, V_1, V_2, p)$ (where $V_2 = \emptyset$), together with a set $T \subseteq V$ of target vertices. Let $T_1 = T$, and let $T_0 \subseteq V$ be the set of vertices that cannot reach $T_1$ in the graph $(V, E)$; let also $U = V \setminus (T_0 \cup T_1)$. From $T_0 \cup T_1$, all strategies are optimal with respect to the objective $\mathrm{Reach}(T)$, so we can fix a pure memoryless strategy on $T_0 \cup T_1$ arbitrarily. Consider the following fixpoint equation for $x\colon V \to [0, 1]$, where for all $v \in V$:

\[
x(v) =
\begin{cases}
0 & \text{if } v \in T_0; \\
1 & \text{if } v \in T_1; \\
\max_{u \in E(v)} x(u) & \text{if } v \in V_1 \cap U; \\
\sum_{u \in E(v)} p(v, u) \cdot x(u) & \text{if } v \in V_0 \cap U.
\end{cases}
\tag{1}
\]

This system of equations in general has many fixpoints, and it is well known that the least fixpoint $x^*$ equals $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\mathrm{Reach}(T))$; see, e.g., [13]. For $v \in U \cap V_1$, define the set of optimal successors of $v$ by $A(v) = \{u \in E(v) \mid x^*(u) = x^*(v)\}$. Clearly, an optimal strategy must select only optimal successors of vertices in $U \cap V_1$. Thus, we cut from the MDP all the edges $(v, u) \in E$ with $v \in V_1 \cap U$ and $u \notin A(v)$. It is immediate to check that $x^*$ is still a fixpoint of (1) in the resulting MDP.
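The least fixpoint of (1) can be approximated from below by iterating the right-hand side, starting from the function that is 1 on $T$ and 0 elsewhere. The sketch below is a plain value iteration with floating-point numbers and a fixed iteration count; it is an illustration under assumed names and an assumed example MDP, not the construction used in the proof. Vertices that cannot reach $T$ keep the value 0 automatically, so $T_0$ is not computed separately.

```python
def reach_values(edges, owner, p, target, iterations=200):
    """Approximate the least fixpoint x* of equation (1) by iteration from below."""
    x = {v: (1.0 if v in target else 0.0) for v in edges}
    for _ in range(iterations):
        new = {}
        for v in edges:
            if v in target:
                new[v] = 1.0
            elif owner[v] == 1:
                new[v] = max(x[u] for u in edges[v])
            else:
                new[v] = sum(p[v][u] * x[u] for u in edges[v])
        x = new
    return x

def optimal_successors(v, x, edges, tol=1e-9):
    """The set A(v) of successors whose value equals the value of v, up to `tol`."""
    return {u for u in edges[v] if abs(x[u] - x[v]) <= tol}

# Hypothetical player-1 MDP: a, e, t, s are player-1 vertices; b and c are probabilistic.
edges = {"a": {"b", "c"}, "b": {"t", "s"}, "c": {"t", "a"},
         "e": {"e", "b"}, "t": {"t"}, "s": {"s"}}
owner = {"a": 1, "b": 0, "c": 0, "e": 1, "t": 1, "s": 1}
p = {"b": {"t": 0.5, "s": 0.5}, "c": {"t": 0.25, "a": 0.75}}

x = reach_values(edges, owner, p, {"t"})
A_a = optimal_successors("a", x, edges)
A_e = optimal_successors("e", x, edges)
```

In this example vertex `e` shows why the least fixpoint matters: any value of at least 0.5 satisfies the equation at `e`, and the iteration from below settles on 0.5. Its set of optimal successors contains `e` itself, although always choosing the self-loop would never reach the target.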

To obtain a memoryless strategy, we can choose all optimal successors of vertices in $U \cap V_1$ uniformly at random. To obtain a memoryless pure strategy, we observe that if a vertex $v \in U \cap V_1$ has multiple optimal successors, i.e., $|A(v)| > 1$, and we cut an edge $(v, u)$ with $u \in A(v)$, then $x^*$ is still a fixpoint of (1) in the resulting MDP. However, we cannot arbitrarily fix one optimal successor for each vertex in $U \cap V_1$ and cut the edges to all other successors: doing so could create new fixpoints below $x^*$. This occurs, for instance, whenever there are mutually reachable vertices with equal $x^*$, and the selected successors create a cycle that prevents reaching $T$. Our goal is to pick optimal successors, and cut the edges to other successors, so that $x^*$ is the only fixpoint of (1) in the resulting MDP. This will guarantee that $x^* = \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\mathrm{Reach}(T))$ for the resulting pure memoryless strategy.

To ensure the uniqueness of the fixpoint, we cut edges
from $V_1 \cap U$ while maintaining the invariant that every
vertex in $U$ can reach $T_1$ in the graph $(V \setminus T_0, E)$. Note that this
invariant holds initially by the definition of $T_0$. Let $E' \subseteq E$
be a subset of edges that consists of shortest paths from $U$
to $T$ such that every vertex has only one outgoing edge, i.e.,
for all $v, u_1, u_2 \in V$, if $(v, u_1), (v, u_2) \in E'$, then $u_1 = u_2$.
Then, prune from player-1 vertices all edges that are not
in $E'$; precisely, for all $v \in U \cap V_1$ and $(v, u) \in E$, keep
$(v, u)$ if $(v, u) \in E'$, and prune it otherwise. The MDP
thus corresponds to a Markov chain. We define the transition
probability matrix $P = [p_{v,u}]_{v,u \in U}$ and the vector $q = [q_v]_{v \in U}$
as follows, for all $v, u \in U$:


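\[
p_{v,u} =
\begin{cases}
1 & \text{if } v \in V_1 \text{ and } (v,u) \in E'; \\
0 & \text{if } v \in V_1 \text{ and } (v,u) \notin E'; \\
p(v,u) & \text{if } v \in V_0;
\end{cases}
\qquad
q_v =
\begin{cases}
1 & \text{if } v \in V_1 \text{ and } \exists u \in T.\ (v,u) \in E'; \\
0 & \text{if } v \in V_1 \text{ and } \forall u \in T.\ (v,u) \notin E'; \\
\sum_{u \in T} p(v,u) & \text{if } v \in V_0.
\end{cases}
\]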

Then $x^*$, as a fixpoint of (1), is a solution of $x = Px + q$.
Since every vertex in $U$ has a path to $T$ in the graph
$(V \setminus T_0, E')$, the matrix $P$ corresponds to a transient chain, and
$\det(I - P) \neq 0$ [22]. Hence, $x^* = (I - P)^{-1} q$ is the
unique fixpoint of (1), showing the optimality of the pure
memoryless strategy thus constructed.
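
The construction can be illustrated on a small instance. The following Python sketch is not taken from the paper: the toy MDP, the vertex names, and the helper names `shortest_path_choice`, `build_chain`, and `solve` are all hypothetical. It assumes that the non-optimal edges have already been cut (in this toy MDP every successor of a player-1 vertex is optimal, with $x^* = 1/2$ on $U$). It picks one successor per player-1 vertex along a shortest path to the target, builds $P$ and $q$, and solves $(I - P)x = q$ exactly.

```python
from collections import deque
from fractions import Fraction

# Hypothetical toy MDP: 'a' and 'b' are player-1 vertices, 'r' is probabilistic,
# 't' is the target, and 's' is a sink that cannot reach 't' (so 's' lies in T_0).
player1_edges = {"a": ["b", "r"], "b": ["a", "r"]}
prob_edges = {"r": {"t": Fraction(1, 2), "s": Fraction(1, 2)}}
targets = {"t"}
U = ["a", "b", "r"]  # vertices outside T_0 and T_1


def successors(v):
    """Successor list of a vertex in U."""
    if v in player1_edges:
        return player1_edges[v]
    return list(prob_edges[v])


def shortest_path_choice():
    """Backward BFS from the target; each player-1 vertex keeps one successor
    that lies on a shortest path to the target."""
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        for v in U:
            if v not in dist and u in successors(v):
                dist[v] = dist[u] + 1
                queue.append(v)
    return {
        v: min((u for u in succ if u in dist), key=dist.get)
        for v, succ in player1_edges.items()
    }


def build_chain(choice):
    """Matrix P and vector q of the Markov chain induced by a pure choice."""
    idx = {v: i for i, v in enumerate(U)}
    P = [[Fraction(0)] * len(U) for _ in U]
    q = [Fraction(0)] * len(U)
    for v in U:
        row = {choice[v]: Fraction(1)} if v in player1_edges else prob_edges[v]
        for u, pr in row.items():
            if u in targets:
                q[idx[v]] += pr
            elif u in idx:
                P[idx[v]][idx[u]] += pr
    return P, q


def solve(P, q):
    """Solve (I - P) x = q by Gauss-Jordan elimination over exact fractions.
    Returns None when I - P is singular."""
    n = len(q)
    M = [[Fraction(int(i == j)) - P[i][j] for j in range(n)] + [q[i]]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


pure_choice = shortest_path_choice()                # {'a': 'r', 'b': 'r'}
values = solve(*build_chain(pure_choice))           # x* restricted to U
cyclic = solve(*build_chain({"a": "b", "b": "a"}))  # None: I - P is singular
```

In this instance the arbitrary choice `a -> b`, `b -> a` creates exactly the kind of cycle described above: the matrix $I - P$ becomes singular and the fixpoint is no longer unique, whereas the shortest-path choice yields the value 1/2 at every vertex of $U$.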


5  Parity  Objectives 

Pure memoryless strategies suffice for sure winning and
optimality (and therefore for almost-sure winning) with
respect to parity objectives.

Theorem 6

1. The family of pure memoryless strategies suffices
for sure winning with respect to parity objectives on
2½-player game graphs.

2. [26, 5] The family of pure memoryless strategies
suffices for optimality with respect to parity objectives on
2½-player game graphs.

Theorem 6(1) for 2-player games is a classical result
of [16]; an alternative proof is presented in [32]. Due to
Proposition 1, the result carries over to 2½-player games.
Theorem 6(2) follows from two independent results: an
analytical proof using results on recursive games of
Everett [17] is presented in [26]; a combinatorial proof using
graph-theoretic arguments is presented in [5].


6  Rabin  Objectives 

Pure memoryless strategies suffice for sure winning with
respect to Rabin objectives in 1½-player and 2-player
games, and for optimality (and therefore for almost-sure
winning) in 1½-player games (MDPs). It is an open problem
whether the family of pure memoryless strategies suffices
for almost-sure winning on 2½-player game graphs.

Theorem 7 The family of pure memoryless strategies
suffices for sure winning with respect to Rabin objectives
on 2½-player game graphs.

Theorem 7 for 2-player games is a classical result
of [16]; an alternative proof is presented in [15]. Due to
Proposition 1, the result carries over to 2½-player games.

It follows from Theorem 4 and Proposition 3 that the
family of pure finite-memory strategies suffices for
optimality (and almost-sure winning) with respect to Rabin
objectives on 2-player game graphs. On the other hand,
pure memoryless strategies suffice for optimality with
respect to Rabin objectives on MDPs, as stated by the
following theorem. This result does not follow from the preceding
results, as the case for 2½-player games is open, as noted
above.

Theorem 8 The family of pure memoryless strategies
suffices for optimality with respect to Rabin objectives
on 1½-player game graphs.

This theorem can be proved using the techniques
developed in [8, 9] to compute the maximal probability of
satisfying an ω-regular specification. We consider player-1
MDPs and hence strategies for player 1. Let $G =
((V, E), V_0, V_1, V_2, p)$ with $V_2 = \emptyset$ be a 1½-player game
graph. The key concept underlying the proof is that of
an end-component. A set $U \subseteq V$ of vertices is an
end-component if $U$ is $p$-closed and the subgame graph $G \upharpoonright U$
is strongly connected. We denote by $\mathcal{E} \subseteq 2^V$ the set of all
end-components of $G$.
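
As a minimal sketch of this definition, the following Python function tests whether a vertex set is an end-component. It assumes that $p$-closedness means that every probabilistic vertex of $U$ has all its successors in $U$ and every player vertex of $U$ has at least one successor in $U$; the function name `is_end_component` and the toy graph are hypothetical.

```python
def is_end_component(U, edges, prob_vertices):
    """Check whether the vertex set U is an end-component.

    edges maps every vertex to its list of successors;
    prob_vertices is the set V_0 of probabilistic vertices.
    """
    U = set(U)
    if not U:
        return False
    # Closedness (assumed reading): probabilistic vertices keep all
    # successors inside U, player vertices keep at least one.
    for v in U:
        succ = set(edges[v])
        if v in prob_vertices and not succ <= U:
            return False
        if v not in prob_vertices and not succ & U:
            return False

    def reachable(start, neighbours):
        """Vertices of U reachable from start without leaving U."""
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in neighbours(x):
                if y in U and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    # Strong connectivity: one vertex reaches, and is reached by, all of U.
    start = next(iter(U))
    forward = reachable(start, lambda x: edges[x])
    backward = reachable(start, lambda x: [y for y in U if x in edges[y]])
    return forward == U and backward == U


# Hypothetical toy graph: 'r' is probabilistic, the other vertices are player-1.
toy_edges = {"a": ["b", "r"], "b": ["a"], "r": ["a", "t"], "t": ["t"]}
toy_prob = {"r"}
```

On this toy graph, `{"a", "b"}` and `{"t"}` pass the test, while `{"a", "b", "r"}` fails because the probabilistic vertex `r` can leave the set.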

We will use two facts about end-components. The first
fact states that, under any strategy, with probability 1 the
set of vertices visited infinitely often along a play is an
end-component. This fact parallels the well-known
property of closed recurrent classes in Markov chains [22].
To state the lemma, for $v \in V$ and $U \subseteq V$, we define
$\Omega_v^U = \{\omega \in \Omega_v \mid \mathrm{Inf}(\omega) = U\}$.

Lemma 1 [8] For all vertices $v \in V$ and strategies $\sigma \in \Sigma$,
we have $\Pr_v^{\sigma}\bigl(\bigcup_{U \in \mathcal{E}} \Omega_v^U\bigr) = 1$.

For an end-component $U \in \mathcal{E}$, we denote by $\tilde\sigma_U$ the
randomized memoryless strategy that at each vertex $v \in U \cap V_1$
selects uniformly at random one of the edges $(v, u) \in E$
having $u \in U$. The following lemma is immediate, as $U$
under strategy $\tilde\sigma_U$ forms a closed recurrent class of a Markov
chain.

Lemma 2 [8] For all end-components $U \in \mathcal{E}$ and all
vertices $v \in U$, we have $\Pr_v^{\tilde\sigma_U}(\Omega_v^U) = 1$.


Consider a set $R = \{(E_1, F_1), \ldots, (E_m, F_m)\}$ of Rabin
pairs. For convenience, set $\overline{E}_i = V \setminus E_i$ for $1 \le i \le m$.
With this notation, the Rabin objective can be read as
follows: a play is winning if there is some $1 \le i \le m$ such
that (1) the play is eventually confined in $\overline{E}_i$, and (2) the
play visits $F_i$ infinitely often. We denote by $\mathcal{U} \subseteq \mathcal{E}$ the set
consisting of the end-components $U \in \mathcal{E}$ such that there is
an $1 \le i \le m$ for which $U \subseteq \overline{E}_i$ and $U \cap F_i \neq \emptyset$. The
set $\mathcal{U}$ thus consists of the end-components that satisfy the
Rabin objective. Let $T_{end} = \bigcup_{U \in \mathcal{U}} U$ be the union of all such
winning end-components. From Lemmas 1 and 2 above, it
follows that the maximal probability of satisfying Rabin($R$)
is equal to the maximal probability of reaching the union of
the winning end-components. We present a proof of this
fact, as it will be useful in the construction of a pure
memoryless strategy.

Lemma 3 [8] $\langle\langle 1 \rangle\rangle_{val}(\mathrm{Rabin}(R)) = \langle\langle 1 \rangle\rangle_{val}(\mathrm{Reach}(T_{end}))$.

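Proof. Given any strategy $\sigma$, let $\sigma'$ be a strategy that
behaves like $\sigma$ outside of $T_{end}$, and that upon entering
$T_{end}$ at a state $v$, follows the strategy $\tilde\sigma_U$, for some
end-component $U \in \mathcal{U}$ with $v \in U$. Then, from
Lemma 2 it follows that for all vertices $v \in V$, we
have $\Pr_v^{\sigma}(\mathrm{Reach}(T_{end})) = \Pr_v^{\sigma'}(\mathrm{Rabin}(R))$, and thus
$\langle\langle 1 \rangle\rangle_{val}(\mathrm{Rabin}(R))(v) \ge \langle\langle 1 \rangle\rangle_{val}(\mathrm{Reach}(T_{end}))(v)$. For the
reverse inequality, consider again an arbitrary strategy $\sigma$, and
notice that from Lemma 1 we have:

\begin{align*}
\Pr_v^{\sigma}(\mathrm{Rabin}(R))
 &= \sum_{U \in \mathcal{E}} \Pr_v^{\sigma}(\mathrm{Rabin}(R) \mid \Omega_v^U) \cdot \Pr_v^{\sigma}(\Omega_v^U) \\
 &= \sum_{U \in \mathcal{U}} \Pr_v^{\sigma}(\mathrm{Rabin}(R) \mid \Omega_v^U) \cdot \Pr_v^{\sigma}(\Omega_v^U) \\
 &\le \sum_{U \in \mathcal{U}} \Pr_v^{\sigma}(\Omega_v^U) \le \Pr_v^{\sigma}(\mathrm{Reach}(T_{end})).
\end{align*}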


As pure memoryless strategies suffice with respect to
reachability in MDPs, the above proof is a first step in
showing that there are pure memoryless optimal strategies.
However, the strategy $\sigma'$ constructed above is not
necessarily pure memoryless, because it needs to remember one of
the winning end-components (corresponding to the entrance
in $T_{end}$), and it follows a randomized strategy inside that
end-component. We can construct a suitable pure
memoryless strategy as follows. Let $\mathcal{U} = \{U_1, \ldots, U_k\}$, thus
fixing an arbitrary order among the winning end-components.
For $1 \le j \le k$, let $pair(j)$ be any fixed $i \in \{1, \ldots, m\}$
such that $U_j \subseteq \overline{E}_i$ and $U_j \cap F_i \neq \emptyset$. In other words,
$(E_{pair(j)}, F_{pair(j)}) \in R$ is a Rabin pair that witnesses the
winning of the end-component $U_j$. With this notation, for
$1 \le j \le k$, let $\bar\sigma_j$ be the pure memoryless strategy defined
over $U_j$ which chooses only successors in $U_j$ such that:

• in $U_j \setminus F_{pair(j)}$, it coincides with a pure memoryless
strategy for reaching $F_{pair(j)}$;

• in $F_{pair(j)}$, it chooses an arbitrary destination in $U_j$.

The existence of such a strategy follows from the existence
of pure memoryless strategies with respect to reachability
(Theorem 5). For $v \in T_{end}$, let $rank(v) = \max\{1 \le j \le
k \mid v \in U_j\}$ be the rank of the vertex $v$. Now define the
strategy $\bar\sigma$ as follows:

• outside $T_{end}$, the strategy $\bar\sigma$ coincides with a pure
memoryless optimal strategy with respect to the
objective Reach($T_{end}$);

• at each vertex $v \in T_{end}$, the strategy $\bar\sigma$ coincides
with $\bar\sigma_{rank(v)}$.

Once such a memoryless strategy is fixed, the MDP
becomes a Markov chain $MC_{\bar\sigma}$, with transition
probabilities defined by $p_{u,v} = \bar\sigma(u)(v)$ for $u \in V_1$, and by
$p_{u,v} = p(u,v)$ for $u \in V_0$. The following lemma
characterizes the closed recurrent classes of this Markov chain
in the set $T_{end}$, stating that they satisfy the Rabin objective.

Lemma 4 If $C$ is a closed recurrent class of the Markov
chain $MC_{\bar\sigma}$ with $C \cap T_{end} \neq \emptyset$, then there is an $1 \le i \le m$
such that $C \subseteq \overline{E}_i$ and $C \cap F_i \neq \emptyset$.

Proof. Let $E' = \{(u,v) \in T_{end}^2 \mid p_{u,v} > 0\}$. The closed
recurrent classes of $MC_{\bar\sigma}$ are the terminal strongly
connected components (SCCs) of the graph $(T_{end}, E')$. By the
construction of $\bar\sigma$, the rank of the vertices along all paths in
$(T_{end}, E')$ is nondecreasing. Hence, each terminal SCC $C$
of $(T_{end}, E')$ must consist of vertices with the same rank;
we indicate this rank by $rank(C)$. Then, at all states of
$C$ the strategy $\bar\sigma_{rank(C)}$ is used. Thus, it immediately
follows that $C \subseteq U_{rank(C)}$. Moreover, since from every state
of $U_{rank(C)} \setminus F_{pair(rank(C))}$ the strategy $\bar\sigma_{rank(C)}$ aims at
reaching $F_{pair(rank(C))}$, and as $C$ has no outgoing edges
in $E'$, it follows that $C \cap F_{pair(rank(C))} \neq \emptyset$. ■

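The optimality of the strategy $\bar\sigma$ is a simple consequence
of Lemma 4.

Corollary 1 For all vertices $v \in V$, we have
$\langle\langle 1 \rangle\rangle_{val}(\mathrm{Rabin}(R))(v) = \Pr_v^{\bar\sigma}(\mathrm{Rabin}(R))$.

Proof. In view of Lemma 3, we need to show that
$\langle\langle 1 \rangle\rangle_{val}(\mathrm{Reach}(T_{end}))(v) = \Pr_v^{\bar\sigma}(\mathrm{Rabin}(R))$. To this
end, it suffices to note that by the construction of $\bar\sigma$,
we have $\langle\langle 1 \rangle\rangle_{val}(\mathrm{Reach}(T_{end}))(v) = \Pr_v^{\bar\sigma}(\mathrm{Reach}(T_{end}))$ and
$\Pr_v^{\bar\sigma}(\mathrm{Rabin}(R) \mid \mathrm{Reach}(T_{end})) = 1$. The second equality
follows from the fact that under strategy $\bar\sigma$, once a play $\omega$
enters $T_{end}$, with probability 1 we have $\mathrm{Inf}(\omega) = C$ for some
closed recurrent class $C$ of $MC_{\bar\sigma}$. Lemma 4 then leads to
the conclusion. ■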
Figure 2. A Streett game.

7  Streett  Objectives 

Sure winning requires memory for Streett objectives
even in the case of 1-player games. This follows from the
example given in the introduction, which is repeated here.

Example 2 Consider the 1-player game graph shown in
Fig. 2. The objective is a Streett objective with two
Streett pairs: $S = \{(E_1, F_1), (E_2, F_2)\}$ for $F_1 = F_2 =
\{v_0, v_1, v_2\}$ and $E_1 = \{v_1\}$ and $E_2 = \{v_2\}$. We
consider the two possible pure memoryless strategies: (1) for
the strategy that always chooses $v_0 \to v_1$, the Streett pair
$(E_2, F_2)$ is not satisfied; and (2) for the strategy that always
chooses $v_0 \to v_2$, the Streett pair $(E_1, F_1)$ is not satisfied.
Hence there is no pure memoryless sure winning strategy
for player 1. It follows from Proposition 2 that there is no
randomized memoryless sure winning strategy either. ■

It will follow from Theorem 10 that memoryless
strategies suffice for almost-sure winning with respect to Streett
objectives on 1½-player (and hence on 1-player) game
graphs. We now show that almost-sure winning in 2-player
Streett games does require memory.

Example 3 Consider the 2-player game graph shown in
Fig. 3. The objective is a Streett objective with two Streett
pairs: $S = \{(E_1, F_1), (E_2, F_2)\}$ for $E_1 = \{v_2, v_4\}$, $E_2 =
\{v_3\}$, $F_1 = \{v_3\}$, and $F_2 = \{v_4\}$. Consider the two
possible pure memoryless strategies for player 1: (1) for the
player-1 strategy that always chooses $v_0 \to v_1$, the player-2
strategy that chooses $v_1 \to v_3$ ensures that the Streett pair
$(E_1, F_1)$ is not satisfied; and (2) for the player-1 strategy
that always chooses $v_0 \to v_4$, the Streett pair $(E_2, F_2)$ is
not satisfied. For any randomized memoryless strategy that
chooses both $v_0 \to v_1$ and $v_0 \to v_4$ with positive
probabilities, the player-2 strategy that chooses $v_1 \to v_2$ ensures
that the vertex set $\{v_0, v_1, v_2, v_4\}$ is visited infinitely often.
Hence the Streett pair $(E_2, F_2)$ is not satisfied. Note,
however, that the pure strategy with memory that chooses $v_0 \to v_4$
once whenever player 2 chooses $v_1 \to v_3$, and otherwise
chooses $v_0 \to v_1$, is a sure winning strategy (and hence
also an almost-sure winning strategy) for player 1. ■
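
The case analysis of Example 3 can be replayed mechanically. The sketch below is only an illustration under stated assumptions: the shape of the graph (vertices $v_2$, $v_3$, $v_4$ all returning to $v_0$) and the Streett pairs are read off the text of the example, a Streett pair $(E_i, F_i)$ is taken to be satisfied when infinitely many visits to $F_i$ imply infinitely many visits to $E_i$, and all function and variable names are hypothetical. Only three player-2 strategies are tried, so the last check is a spot check rather than a proof of sure winning.

```python
# Fig. 3 as read from the text: v0 belongs to player 1, v1 to player 2,
# and v2, v3, v4 all lead back to v0 (an assumption of this sketch).
STREETT_PAIRS = [({"v2", "v4"}, {"v3"}), ({"v3"}, {"v4"})]  # (E_i, F_i)


def streett(inf):
    """Assumed reading: if F_i is visited infinitely often, then so is E_i."""
    return all(bool(inf & E) or not (inf & F) for E, F in STREETT_PAIRS)


def inf_set(p1, p2, steps=120):
    """Play from v0 and return the vertices seen in the second half of the
    play, which equals Inf for the eventually periodic plays used here."""
    history = ["v0"]
    for _ in range(steps):
        v = history[-1]
        if v == "v0":
            history.append(p1(history))
        elif v == "v1":
            history.append(p2(history))
        else:
            history.append("v0")
    return set(history[steps // 2:])


# Player-1 strategies: two pure memoryless ones and the one with memory.
always_v1 = lambda h: "v1"
always_v4 = lambda h: "v4"
# Answer each visit to v3 with a single detour through v4.
with_memory = lambda h: "v4" if len(h) >= 2 and h[-2] == "v3" else "v1"

# Player-2 strategies used as opponents.
to_v2 = lambda h: "v2"
to_v3 = lambda h: "v3"
alternating = lambda h: "v2" if h.count("v1") % 2 else "v3"

loses_1 = not streett(inf_set(always_v1, to_v3))
loses_2 = not streett(inf_set(always_v4, to_v2))
memory_wins = all(streett(inf_set(with_memory, p2))
                  for p2 in (to_v2, to_v3, alternating))
```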

The results on Streett games are summarized in the
following theorem.

Theorem 9

1. The family of memoryless strategies does not suffice
for sure winning with respect to Streett objectives
on 1-player game graphs.

2. The family of memoryless strategies suffices for
almost-sure winning with respect to Streett objectives
on 1½-player game graphs.

3. The family of memoryless strategies does not suffice
for almost-sure winning with respect to Streett
objectives on 2-player game graphs.

8  Muller  Objectives 

It follows from Example 2 that sure winning strategies
require memory for Muller objectives even in 1-player
games. Moreover, Example 3 shows that in 2-player games
with Muller objectives almost-sure winning requires
memory. We now show that for 1½-player Muller games
memoryless almost-sure winning strategies exist.

Theorem 10 The family of memoryless strategies suffices
for optimality with respect to Muller objectives on
1½-player game graphs.

Given a set $\mathcal{M}_{\mathcal{F}} \subseteq 2^V$ of Muller sets, we denote by
$\mathcal{U} = \mathcal{E} \cap \mathcal{M}_{\mathcal{F}}$ the set of end-components that are Muller sets
(see Section 6 for a definition of end-components); these
are the winning end-components. Let $T_{end} = \bigcup_{U \in \mathcal{U}} U$ be
their union. From Lemmas 1 and 2, it follows that the
maximal probability of satisfying the objective Muller($\mathcal{M}_{\mathcal{F}}$) is
equal to the maximal probability of reaching the union of
the winning end-components.

Lemma 5 $\langle\langle 1 \rangle\rangle_{val}(\mathrm{Muller}(\mathcal{M}_{\mathcal{F}})) = \langle\langle 1 \rangle\rangle_{val}(\mathrm{Reach}(T_{end}))$.

The proof of this lemma is analogous to the proof of
Lemma 3. To construct a memoryless winning strategy,
we again let $\mathcal{U} = \{U_1, \ldots, U_k\}$, thus fixing an arbitrary
order among the winning end-components, and we define the
rank of a vertex $v \in T_{end}$ by $rank(v) = \max\{1 \le j \le k \mid
v \in U_j\}$. We define a randomized memoryless strategy $\tilde\sigma$ as
follows:

• In $V \setminus T_{end}$, the strategy $\tilde\sigma$ coincides with an optimal
memoryless strategy to reach $T_{end}$.

• At each vertex $v \in T_{end} \cap V_1$, the strategy $\tilde\sigma$
coincides with the strategy $\tilde\sigma_{U_{rank(v)}}$ defined in Section 6;
that is, it selects uniformly at random one of the edges
$(v, u) \in E$ having $u \in U_{rank(v)}$.
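
The rank-based choice inside the union of the winning end-components can be sketched in a few lines of Python. This is an illustration only: the graph, the two overlapping sets declared as winning end-components, and the names `rank` and `uniform_strategy` are hypothetical, and the part of the strategy outside the union (an optimal reachability strategy) is omitted.

```python
from fractions import Fraction

# Hypothetical 1-player graph and an arbitrary ordering U_1, U_2 of two
# overlapping end-components that are assumed to be winning.
edges = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
components = [{"a", "b", "c", "d"}, {"a", "b"}]


def rank(v):
    """Largest index j (1-based) such that v belongs to U_j."""
    return max(j for j, U in enumerate(components, start=1) if v in U)


def uniform_strategy(v):
    """Uniform distribution over the edges from v that stay in U_rank(v)."""
    U = components[rank(v) - 1]
    support = [u for u in edges[v] if u in U]
    return {u: Fraction(1, len(support)) for u in support}


# Ranks never decrease along edges played with positive probability.
monotone = all(rank(u) >= rank(v) for v in edges for u in uniform_strategy(v))
```

Here `rank("a")` is 2, so at `a` the strategy only plays the edge to `b`, whereas at `c` (rank 1) it splits evenly between `a` and `d`. Every play therefore ends up in the set `{"a", "b"}`.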

Once such a memoryless strategy is fixed, the MDP
becomes a Markov chain $MC_{\tilde\sigma}$, with transition probabilities
defined by $p_{u,v} = \tilde\sigma(u)(v)$ for $u \in V_1$, and by $p_{u,v} =
p(u,v)$ for $u \in V_0$. The following lemma characterizes
the closed recurrent classes of this Markov chain in the
set $T_{end}$, stating that they are all winning end-components.

Lemma 6 If $C$ is a closed recurrent class of the Markov
chain $MC_{\tilde\sigma}$, then either $C \cap T_{end} = \emptyset$ or $C \in \mathcal{U}$.

Proof. Let $E' = \{(u,v) \in T_{end}^2 \mid p_{u,v} > 0\}$. The
closed recurrent classes of $MC_{\tilde\sigma}$ are the terminal SCCs of
the graph $(T_{end}, E')$. As the rank of the vertices along all
paths in $(T_{end}, E')$ is nondecreasing, each terminal SCC $C$
of $(T_{end}, E')$ must consist of vertices with the same rank,
denoted $rank(C)$. Clearly, $C \subseteq U_{rank(C)}$. To see that
$C = U_{rank(C)}$, note that in $C$ player 1 follows the strategy
$\tilde\sigma_{U_{rank(C)}}$, which causes the whole of $U_{rank(C)}$ to be visited.
Hence, as $C$ is terminal, we have $C = U_{rank(C)}$. ■

The optimality of the strategy $\tilde\sigma$ is a simple consequence
of Lemma 6. The following corollary is proved in a fashion
analogous to Corollary 1.

Corollary 2 For all vertices $v \in V$, we have
$\langle\langle 1 \rangle\rangle_{val}(\mathrm{Muller}(\mathcal{M}_{\mathcal{F}}))(v) = \Pr_v^{\tilde\sigma}(\mathrm{Muller}(\mathcal{M}_{\mathcal{F}}))$.

9  Upward-closed  Objectives 

We show that memoryless almost-sure winning strategies
exist for all 2½-player Muller games if the objective
can be specified in an upward-closed way.

Theorem 11 The family of memoryless strategies suffices
for almost-sure winning on 2½-player game graphs
with respect to Muller objectives that have upward-closed
specifications.

Proof. Consider an upward-closed specification $\mathcal{F}$ of an
objective Muller($\mathcal{M}_{\mathcal{F}}$) and a 2½-player game graph $G =
((V, E), V_0, V_1, V_2, p)$. Let $W_1 \subseteq V$ be the almost-sure
winning set for player 1. It is easy to argue that for every
vertex $u \in W_1 \cap V_1$, there is a vertex $v \in W_1$ with
$(u,v) \in E$, and for every vertex $u \in W_1 \cap (V_0 \cup V_2)$, for
all edges $(u,v) \in E$ we have $v \in W_1$. Hence, $G \upharpoonright W_1$
is a subgame graph. By the definition of $W_1$, player 1
has a winning strategy $\sigma_w$ (memoryless or not) such that
$\Pr_v^{\sigma_w,\pi}(\mathrm{Muller}(\mathcal{M}_{\mathcal{F}})) = 1$ for all vertices $v \in W_1$ and
player-2 strategies $\pi$. Moreover, the strategy $\sigma_w$ can choose
only edges in $G \upharpoonright W_1$, as it cannot leave $W_1$. Hence, from
now on we concentrate on the subgame graph $G \upharpoonright W_1$.

Let $\tilde\sigma$ be the memoryless player-1 strategy that plays
uniformly at random in $G \upharpoonright W_1$. Precisely, for a vertex
$u \in W_1 \cap V_1$, let $E_u = \{(u,v) \in E \mid v \in W_1\}$, and let $\tilde\sigma$
be the player-1 strategy that at $u \in W_1 \cap V_1$ plays each edge
in $E_u$ uniformly at random. We claim that $\tilde\sigma$ is winning,
that is, $\Pr_v^{\tilde\sigma,\pi}(\mathrm{Muller}(\mathcal{M}_{\mathcal{F}})) = 1$ for all vertices $v \in W_1$
and player-2 strategies $\pi$, thus showing the existence of a
memoryless almost-sure winning strategy for player 1.

Assume, towards a contradiction, that player 2 has a
strategy $\pi_w$ such that $\Pr_v^{\tilde\sigma,\pi_w}(\mathrm{Muller}(\mathcal{M}_{\mathcal{F}})) < 1$ for some
vertex $v \in W_1$. Note that $G \upharpoonright W_1$ is a player-2 MDP under
strategy $\tilde\sigma$; we denote this player-2 MDP by $(G \upharpoonright W_1)_{\tilde\sigma}$.
From our results on Muller MDPs, there must be an
end-component $A_2 \subseteq W_1$ of $(G \upharpoonright W_1)_{\tilde\sigma}$ which is winning for
player 2, that is, $[A_2] \notin \mathcal{F}$. Moreover, player 2 has a
memoryless strategy $\tilde\pi$ that enables it to win with maximal
probability in $(G \upharpoonright W_1)_{\tilde\sigma}$, and $A_2$ is a closed recurrent class of
the Markov chain $(G \upharpoonright W_1)_{\tilde\sigma,\tilde\pi}$.

Figure 3. A Muller game.

Now consider the situation arising when player 1 uses its
original winning strategy $\sigma_w$ against $\tilde\pi$. Under strategy $\tilde\pi$,
the game graph $G \upharpoonright W_1$ is a player-1 MDP, which we
denote by $(G \upharpoonright W_1)_{\tilde\pi}$. As $A_2$ is closed under $\tilde\sigma$ and $\tilde\pi$, it has
no outgoing player-1 edge in $(G \upharpoonright W_1)_{\tilde\pi}$. By the definitions
of $\sigma_w$ and $A_2$, player 1 can win with probability 1 from $A_2$.
Therefore, again from our results on Muller MDPs, there
must be an end-component $A_1 \subseteq A_2$ of $(G \upharpoonright W_1)_{\tilde\pi}$ which
is winning for player 1, that is, $[A_1] \in \mathcal{F}$. Since
$[A_1] \subseteq [A_2]$ and $[A_2] \notin \mathcal{F}$, this contradicts
the upward-closure of $\mathcal{F}$. ■

There are games with Muller objectives such that sure
winning with a pure strategy requires $O(n)$ memory, where
$n$ is the number of vertices, but almost-sure winning can
be achieved by a randomized memoryless strategy. To see
this, for arbitrary $n > 0$, consider the set $C = \{c_1, \ldots, c_n\}$
of colors and the Muller specification $\mathcal{F} = \{C\}$. It
follows from the split-tree construction of [15] that there is a
2-player game graph $G_n$ with $n$ vertices, each of which is
labeled by a unique color from $C$, such that a pure sure
winning strategy on $G_n$ for the objective Muller($\mathcal{M}_{\mathcal{F}}$) requires
$O(n)$ memory. On the other hand, since $\mathcal{F}$ is upward-closed,
by Theorem 11 a randomized memoryless almost-sure
winning strategy exists.
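
Upward-closure of a finite specification is easy to test directly. The following sketch is a hypothetical helper, not part of the paper: it represents a specification as a collection of sets of colors and uses the fact that closure under adding a single color implies closure under all supersets.

```python
def is_upward_closed(spec, colors):
    """Return True iff every superset (within colors) of a set in spec
    also belongs to spec."""
    spec = {frozenset(s) for s in spec}
    colors = frozenset(colors)
    # Adding one color at a time is enough: any superset is reached by
    # a finite chain of such single-color extensions.
    return all(s | {c} in spec for s in spec for c in colors - s)


C = {"c1", "c2", "c3"}
only_full_set = [C]                                  # the specification {C}
must_see_c1_c2 = [{"c1", "c2"}, {"c1", "c2", "c3"}]  # a generalized Buchi style spec
exactly_c1 = [{"c1"}]                                # not upward-closed
```

With these definitions, `is_upward_closed(only_full_set, C)` and `is_upward_closed(must_see_c1_c2, C)` hold, while `is_upward_closed(exactly_c1, C)` does not.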

We  now  show  that  there  exists  a  2-player  game  graph 
such  that  for  every  Muller  objective  that  is  not  upward- 
closed,  randomization  does  not  help,  i.e.,  memoryless 
almost-sure  winning  strategies  exist  if  pure  memoryless 
almost-sure  winning  strategies  exist,  whereas  strategies 
with  memory  may  be  almost-sure  winning. 

Example 4 Let $C$ be a set of colors, and let $\mathcal{F}$ be a
specification of a Muller objective over $C$ which is not
upward-closed. Let $\alpha \subseteq \beta \subseteq C$ such that $\alpha \in \mathcal{F}$ and $\beta \notin \mathcal{F}$ witness
that $\mathcal{F}$ is not upward-closed. Consider the 2-player game
graph shown in Fig. 3, where the □ vertices are the player-1
vertices, and the ○ vertices are the player-2 vertices. The
colors of each vertex are defined by $[v_2] = \alpha$, $[v_4] = \beta$, and
$[v_0] = [v_1] = [v_3] = \emptyset$.

We show that every memoryless strategy that is not pure
is not an almost-sure winning strategy. Consider the
randomized memoryless strategy $\sigma$ for player 1 which plays at
$v_0$ both edges $v_0 \to v_1$ and $v_0 \to v_4$ with positive
probability. Let $\pi$ be the strategy for player 2 which chooses
$v_1 \to v_2$ at $v_1$. Given the strategies $\sigma$ and $\pi$, the game
is a Markov chain and the vertex set $\{v_0, v_1, v_2, v_4\}$ is a
closed recurrent class of the Markov chain; hence it is
visited infinitely often. Thus, the set of colors that are visited
infinitely often is $\alpha \cup \beta = \beta$, because $\alpha \subseteq \beta$. Since $\beta \notin \mathcal{F}$,
there is no randomized memoryless almost-sure winning
strategy.

We now show that on the game graph of Fig. 3, for every
set $\delta \subseteq C$, if $\beta \subseteq \delta$ and $\delta \in \mathcal{F}$, then almost-sure
winning strategies exist for player 1. The vertex colors are
now defined by $[v_2] = \alpha$, $[v_4] = \beta$, $[v_3] = \delta \setminus \beta$, and
$[v_0] = [v_1] = \emptyset$. We construct a sure winning strategy (and
hence an almost-sure winning strategy) that uses memory.
Consider the following strategy $\sigma$ for player 1: given any
sequence of vertices $w \in V^*$, let

\[
\sigma(w \cdot v_0) =
\begin{cases}
v_1 & \text{if the last vertex of } w \text{ is not } v_3; \\
v_4 & \text{otherwise.}
\end{cases}
\]

Intuitively, the strategy $\sigma$ can be described as follows: if at
vertex $v_1$ the edge $v_1 \to v_2$ is played, then player 1 plays
$v_0 \to v_1$ at $v_0$; if at vertex $v_1$ the edge $v_1 \to v_3$ is played,
then player 1 chooses $v_0 \to v_4$ followed by $v_0 \to v_1$. We
prove that $\sigma$ is a sure winning strategy for player 1 by
considering the following three cases:

1. For every play $\omega$ such that $v_1 \to v_2$ occurs infinitely
often and $v_1 \to v_3$ occurs finitely often, we have
$\mathrm{Inf}(\omega) = \{v_0, v_1, v_2\}$ and $[\mathrm{Inf}(\omega)] = \alpha \in \mathcal{F}$.

2. For every play $\omega$ such that $v_1 \to v_3$ occurs infinitely
often and $v_1 \to v_2$ occurs finitely often, we have
$\mathrm{Inf}(\omega) = \{v_0, v_1, v_3, v_4\}$ and $[\mathrm{Inf}(\omega)] = \beta \cup (\delta \setminus \beta) =
\delta \in \mathcal{F}$.

3. For every play $\omega$ such that $v_1 \to v_3$ occurs infinitely
often and $v_1 \to v_2$ occurs infinitely often, we have
$\mathrm{Inf}(\omega) = \{v_0, v_1, v_2, v_3, v_4\}$ and $[\mathrm{Inf}(\omega)] = \alpha \cup \beta \cup
(\delta \setminus \beta) = \beta \cup (\delta \setminus \beta) = \delta \in \mathcal{F}$, because $\alpha \subseteq \beta$.

Since $\alpha, \delta \in \mathcal{F}$, it follows that $\sigma$ is a sure winning strategy.
■

The following example shows that sure winning may
require memory for 1-player games with upward-closed
objectives. It follows that Theorem 11 cannot be strengthened
to sure winning strategies.

Table 1. AS - Almost Sure, PM - Pure Memoryless, F - Finite Memory, RM - Randomized Memoryless.

          Parity          Rabin           Streett         Muller          Upward-closed
Players   Sure  Optimal   Sure  Optimal   Sure  Optimal   Sure  Optimal   Sure  AS
2½        PM    PM        PM    F         F     F         F     F         F     RM
2         PM    PM        PM    PM        F     F         F     F         F     RM
1½        PM    PM        PM    PM        F     RM        F     RM        F     RM
1         PM    PM        PM    PM        F     RM        F     RM        F     RM

Example 5 Recall the 1-player game graph shown in
Fig. 2. The set of colors is $C = \{c_1, c_2\}$, the vertex $v_1$ is
labeled with color $c_1$, and $v_2$ is labeled with $c_2$. The
specification of the Muller objective is $\mathcal{F} = \{\{c_1, c_2\}\}$; that is, the
objective of the player is to visit both $v_1$ and $v_2$ infinitely
often. We have already seen that there is no pure memoryless
sure or almost-sure winning strategy to achieve this objective.
Note, however, that a strategy that alternately chooses between
$v_0 \to v_1$ and $v_0 \to v_2$ is a sure winning strategy. Now
consider the randomized memoryless strategy that chooses
the edges $v_0 \to v_1$ and $v_0 \to v_2$ each with probability 1/2.
Then, with probability 1 all vertices are visited infinitely
often. Thus it is an almost-sure winning strategy. ■
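
The three kinds of strategy in Example 5 can be compared with a short Python sketch. It is only an illustration: the graph of Fig. 2 is assumed to consist of the edges $v_0 \to v_1$, $v_0 \to v_2$, $v_1 \to v_0$, and $v_2 \to v_0$, the names are hypothetical, and the winning test inspects a finite tail of a periodic play rather than the infinite play itself. For the randomized strategy the sketch only computes the probability that one of the two vertices is never chosen during a finite number of rounds, which tends to 0.

```python
from fractions import Fraction


def play(strategy, rounds):
    """strategy(memory) returns (next_vertex, new_memory); the play starts
    at v0 and returns to v0 after every move."""
    memory, visited = 0, []
    for _ in range(rounds):
        nxt, memory = strategy(memory)
        visited += [nxt, "v0"]
    return visited


always_v1 = lambda m: ("v1", m)                             # pure memoryless
always_v2 = lambda m: ("v2", m)                             # pure memoryless
alternate = lambda m: ("v1", 1) if m == 0 else ("v2", 0)    # one bit of memory


def wins(visited):
    """Both v1 and v2 occur in the second half of a periodic play."""
    return {"v1", "v2"} <= set(visited[len(visited) // 2:])


def prob_missing_vertex(rounds):
    """Uniform random choice at v0: probability that v1 or v2 is never
    chosen during the given number of independent rounds."""
    return 2 * Fraction(1, 2) ** rounds
```

For instance, `prob_missing_vertex(10)` equals 1/512, and the bound halves with every additional round.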

10  Conclusion 

The memory and randomization requirements of sure
winning and optimal (or almost-sure winning) strategies
for 2½-, 2-, 1½-, and 1-player game graphs are summarized
in Table 1. We showed that in 2½-player games with
upward-closed objectives randomized memoryless
almost-sure winning strategies exist. Moreover, the randomized
memoryless strategies are always simple, in the sense that
they use only uniform randomization over given sets of
edges. Several important classes of Muller objectives, such
as generalized Büchi objectives, are upward-closed. In
particular, in 2-player games with generalized Büchi objectives
the classical pure sure winning strategies require memory,
but randomized memoryless optimal strategies exist.

In the case of 2½-player games with parity objectives
pure memoryless sure winning, almost-sure winning, and
optimal strategies exist [4, 5]. It is an open problem whether
pure memoryless almost-sure winning strategies exist for
2½-player games with Rabin objectives. We also leave
open the problem whether memoryless optimal strategies
exist for 2½-player games with upward-closed objectives.

We considered turn-based games, where at each
(non-probabilistic) vertex one of the two players chooses a
successor vertex. A more general class of games are the
concurrent games, where at each vertex both players
simultaneously and independently choose moves, and the
combination of the chosen moves results either
deterministically or probabilistically in a specific successor vertex. The
following results are known for concurrent games [10]:
memoryless strategies suffice for optimality with respect to
safety objectives; memoryless strategies suffice for
optimality with respect to reachability objectives only in the limit;
and Büchi objectives require both infinite memory and
randomization for almost-sure winning. In the case of
concurrent games, sure winning is always simpler than almost-sure
winning, in terms of the requirements of winning strategies.
In contrast, for MDPs with Muller objectives sure winning
strategies require memory but memoryless strategies suffice
for almost-sure winning.